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Abstract 

Importance sampling (IS) is a variance reduction method for simu- 
lating rare events. A recent paper by Dupuis, Wang and Sezer (Ann. 
App. Probab. 17(4):1306- 1346, 2007) exploits connections between IS 
and subsolutions to a limit HJB equation and its boundary conditions 
to show how to design and analyze simple and efficient IS algorithms for 
various overflow events for tandem Jackson networks. The present paper 
uses the same subsolution approach to build asymptotically optimal IS 
schemes for stable open Jackson networks with a tree topology. Cus- 
tomers arrive at the single root of the tree. The rare overflow event we 
consider is the following: given that initially the network is empty, the 
system experiences a buffer overflow before returning to the empty state. 
Two types of buffer structures are considered: 1) A single system-wide 
buffer of size n shared by all nodes, 2) each node i has its own buffer of 
size fan, Pi G (0, 1). 

1 Introduction 

Importance sampling (IS) is a method for simulation of rare events. It is 
used in many applications including simulation of communication systems, 
computation of credit risk and pricing of financial derivatives. The idea in 
IS is to change the sampling distribution (and modify the Monte Carlo esti- 
mator accordingly) to reduce estimator variance. Queuing processes are basic 
stochastic models that are commonly used in a wide range of application ar- 
eas. The simplest type of queuing processes are Jackson networks, in which 
the arrival and service times at the nodes of the network are assumed to be 
independent and exponentially distributed with constant rates. 



In the present paper we build an IS algorithm, which is optimal in a certain 
asymptotic sense (see Section [3]), to simulate buffer overflows of stable open 
Jackson networks with a tree topology. The system is stable in the sense that 
the average service rate at each node is faster than the average arrival rate to 
that node. Customers arrive at the single root of the tree. The rare overflow 
event we consider is the following: given that initially the network is empty 
the system experiences a buffer overflow before returning to the empty state. 
Two types of buffer structures are considered: 1) A single system-wide buffer 
of size n shared by all nodes 2) each node i has its own buffer of size Pin, 
Pi € (0,1). 

To construct our optimal IS algorithms we use an optimality result from 
|16j which was obtained using the optimal control/subsolution approach to IS 
of [HI [3j HI O [5] . This result states that to construct optimal IS algorithms for 
the simulation of a wide range of buffer overflow events of any stable Jackson 
network it is sufficient to build appropriate smooth subsolutions to a Hamilton 
Jacobi Bellman (HJB) equation and its boundary conditions (these are given 
in (J7J) in the context we study in the current paper). This HJB equation and 
the boundary conditions are the main tools of the optimal control/subsolution 
approach and are derived from an optimal control representation of the IS 
distribution construction problem. 

The main contribution of the present paper is a recursive algorithm which 
takes as input the parameters of an arbitrary Jackson network with a tree 
topology and constructs a smooth subsolution to the HJB equation and its 
boundary conditions given in ([7]). The constructed subsolution is of the form 
of a smoothed minimum of afflne functions, as was the case in previous works 
using the subsolution approach, e.g. [121 [16]. The quantities that appear in the 
subsolution (and hence the algorithm) have simple heuristic interpretations 
as effective utilities and rates of nodes in the system. They are "effective" 
in the sense that they depend on whether a node is empty or nonempty. 
These concepts are explained in detail in subsection 14.11 The main results of 
the paper are Lemmas 14.21 and 16.11 which prove that the subsolutions arising 
from the effective rates and utilities satisfy all the conditions of the general 
optimality theorem in [16] for both type of buffer structures that we will be 
studying in this paper. Numerical results in Sections [5] and [6] demonstrate the 
practical usefulness of the resulting IS algorithms. 

Since the initial writing [TS] of the present paper a recent paper by Dupuis 
and Wang [7] appeared that treat the IS problem for any stable Jackson 
network using the subsolution approach. The relation between the results in 
the current paper and those in [7] is discussed in Section [7J 

There is a tremendous amount of work on the IS of queueing networks, 
which include, [13 UHl HH El [lOl (H HH El [T] . The problem of constructing 
IS algorithms for buffer overflow of queuing networks was first posed for the 
simple two node tandem network in [TT], which also proved that a static 
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large deviations based change of measure is asymptotically optimal for certain 
parameter values of the system. An asymptotically optimal IS algorithm with 
optimality proofs for buffer overflow of stable tandem Jackson networks was 
first developed in [12] using the optimal control/subsolution approach. The 
discontinuous dynamics of the queuing process near the boundaries of its state 
space (i.e., when few customers remain in some of the nodes) makes the IS 
construction problem for queuing networks difficult [121 [8]. This property 
rules out iid sampling distributions (such as those developed in [T7] in the 
context of a random walk on the real line and in [11] in the context of two 
tandem Jackson nodes) as candidates for efficient IS samplers and forces one to 
search for a good IS distribution among dynamic distributions, where indeed 
the subsolution approach locates the optimal IS distributions. For a more in 
depth discussion of these issues we refer the reader to [121 EH El E] . 

2 Setup 

We consider Jackson networks with a tree topology. Customers arrive only 
at the root of the tree. Our goal is to construct optimal IS algorithms to 
estimate the following probability: 

Po (system experiences an overflow before it empties). (1) 

This overflow event depends on the buffer structure of the network, which will 
be made precise in subsection 12.21 For the computation of po it is enough 
to consider the embedded discrete time random walk of the Jackson network. 
The normalized service and arrival rates and the routing probabilities of the 
Jackson network are the jump probabilities of the embedded random walk. 

2.1 Notation and Definitions 

The tree consists of d nodes. X(i) is the population of i th node at the jump 
times in the network, i — ► j denotes that node j is a child of node i. For 
i — > j, jJLij > is the rate at which customers are served in node i and are 
either [sent to node j if j > 0] or [ leave the system if j = 0] . 

Total service rate at node i is defined as /i, = pi^. Arrival rate to node 
Aj at node j equals A if j is the root node. Otherwise it equals Aj = Aj^ 2 - 
where node i is the parent of node j. It is no loss of generality to assume that 
A + Yli=i Mi equals 1 ; otherwise one can change the time unit so that the 
equality holds. The utility of node i is defined as: pi = Aj//Xj. The Jackson 
network is called stable if pi < 1 for all i 6 {1, 2, d}. Therefore we assume 
that yf = iPi < 1. This stability assumption implies that the buffer overflow 
events of interest we study in the present paper decay exponentially in n (see 
pup and (flU]) ). Asymptotic optimality of an IS algorithm is stated in terms 
of this exponential decay (see Section [3]). 
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The evolution of the random walk X takes place in the state space Z^_. 
This set has 2 d — 2 different boundaries: c\ = {x = (x±,X2, x^) G : X{ = 

0}, i G {1,2, ...,d}, <9{n,i 2 ,...,j fc } = flLi d iv {h,h, -,*fc} C {1, 2, d}. As we 
have remarked earlier the dynamics of X depends on whether X is on one of 
these boundaries and if so it further depends on which one. We will find it 
convenient to identify these boundaries with bitmaps b € {0, l} d . b describes 
the following state of the network: b(i) = signifies that node i is empty, 
b(i) = 1 signifies that it is non-empty. Define Uo,i = (1,0, ...,0) and 

V2 = {vi,j,i,j G {1,2, ...,d} : i -> j, 

= _1 > = !> = 0, fc G {1,2, ...,d} - {i,j}} 

V 3 = Ko, » G {1, 2, d} : Ui,o(») = -1, v i>0 (k) = 0, G {1, 2, d} - {f}} 

Let V = {^0,1} U V2 U V3. V are the set of all possible jumps the process X can 
make, vqi corresponds to a new customer arriving at the root node, Vij G V2 
corresponds to server i serving a customer in queue i and sending it to queue j 
with i — > j, and finally 1^0 G V3 corresponds to a customer leaving the system 
after being served by server i. 

Let Y = {Yfc : k = 0, 1, 2, ...} be an iid sequence such that P^Yfe = i>o,i) = 
p(vo,i) = A, Pa;(Yfe = Vij) = p(vij) = (jLij for Vij G V 2 , P x (Yk = v i>0 ) = 
p(vi,o) = Mi,0 f° r u «,o G V3, for all x G T,\. Y k are the unconstrained increments 
of the process X. We assume the existence of a probability space (^l,^) 
equipped with the probability distributions P x . The subscript x denotes the 
initial position of the queuing system Xq: under P x , Xq = x almost surely. 

X £ dri 1} i 2 _ i k \ if the Jackson network has no customers in queues i±, 
Z2,...,and ifc. Therefore vij, j G {0, 1, 2, .., d}, I G 12, ife}, cannot be an 
increment of X when X 6 ^{i 1 ,j 2 , ...,i fe }- The constraining map tt : x V — > 
V U {0} will make sure that this does not happen: 

. . I 0, if x G di for some i G {1, 2, d} and (i>, n^) < 0, 
I otherwise, 

where Hi is normal to the boundary df. nj(i) = 1 and rii(j) = for j ^ i. X 
can now be written as 

Xk+i = X k + ir(X k ,Y k ). (2) 

Xq is the initial state of the system and under P x it equals x G T,\ almost 
surely. 

2.2 Overflow event of interest 

We would like to develop IS algorithms to estimate ([I]). We now define what 
we mean by an overflow. Let d+ = {x £ : Vix(i) = 1}. 
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Assumption 1. The system has a buffer whose structure is determined by 
a normalized exit set S C [0, l] d with the following properties: 1) S is closed 
and connected, 2) ^ S, 3) Any continuous curve in [0, l] d that contains 
and a point from d+ must also contain a point from S. 4) For S n = {x G : 
x/n G S}, 

7 = lim -- log PJX hits S n before 0) (3) 
exists and is nonzero. 

In this article we are interested in two types of buffer structures: 1) S\ = 
{x G : x(l) + x{2) + ••• + x(d) = 1}. S n = {x G 1,% : x/n G Si} 
corresponds to a single buffer of size n shared by all queues. For (3 G 
1S2 = { x £ 1^+ : x(i) = /3(i) for some i and x(j) < for all j}. Then 

= G 2^ : x/n G £2} corresponds to d independent buffers, one for each 
node. The size of the buffer for node i is given by n(5{i). Without loss of 
generality we will assume that Vj/3(z) = 1. 

Define the initial point s = (1, 0, 0, 0, . . . , 0). Fix a buffer structure S and 
define the exit boundaries S n as above. We now rewrite the exit probability 
of interest precisely as: p n = P S {X hits S n before it hits 0). We consider the 
case S = S\ (all nodes share a single buffer) in Section H] and the case S = S2 
(one buffer for each node) in Section [6l 

3 Importance Sampling 

In order to simulate X using importance sampling one specifies a sampling 
distribution p(v\x), v G V and x G Z + and simulates X from this distribution. 
Note that we allow p to depend on x, the current position of X. Define A n to 
be the set of sample paths that hit the exit set S n before and let T n denote 
the first time X hits S n or 0. The IS estimator of p n using K sample paths is 
then: 

k=l i=l ) 

where X k denotes the k th independent sample path used in the simulation. 
The increments {Y k } are iid copies of the increment process Y sampled from 
p. X k is built along with Y k using the dynamics ([2]). The product is the 
likelihood ratio of P s and P, which appears in the estimator to cancel off the 
effect of changing the sampling distribution from p to p. 

Pn = P n is an unbiased estimator of p n and therefore the variance of p n 
depends on the sampling distribution only through the second moment of p n . 
Because p n decays exponentially, one would like the second moment of p n to 
decay exponentially as well. However, Jensen's inequality implies that 

1 2 

lim sup log E[p^] < lim sup log E[p n ] = 2j. 

71 71 n Tl 
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In other words, the exponential decay rate of the second moment can be at 
most twice that of the probability. The IS estimator is said to be asymptoti- 
cally optimal if the upper bound is achieved, i.e., if liminf n — ^ logEfji 2 ] > 27. 

3.1 Definitions from the subsolution approach 

In this subsection we will give only the definitions from the subsolution ap- 
proach that we need to present the results and the algorithm for the tree 
Jackson networks. A full development of the subsolution approach ideas can 
be found in |T2] . 

Hamiltonians, the limit HJB equation and the boundary conditions. 

For a bitmap b E {0, l} d and q E M d define 

N b (q) = Ae-"«/ 2 + Y,^ e3h} ^ m+ Yl ^oe^+ Yl 

i:b(i)=l i:b(i)=l i:b(i)=0 

H b (q) = -2 log N b (q). (5) 

H b is the Hamiltonian associated with boundary b. We denote H b by H if 
b= (1,1,1,..., 1,1). 

For x E M+, define b x E {0, l} d as follows: 

JO, ifz(i) = 0, 
bx(i) = < . (6) 

II, otherwise. 

b x indicates which boundary x is on (if b x = (1,1,..., 1, 1) then x is in the 
interior of R^_). 

Definition of a subsolution. The limit HJB equation and its boundary 
conditions that are in the center of the subsolution approach are as follows: 

H(DV(x)) = 0, H bx (DV(x))=0, (7) 

where DV denotes the gradient of V . A subsolution to ([7]) is defined as follows: 

Definition 3.1. V is an e-subsolution to ((Jj) if it is C^IR^IR) and 

(a) H bx (DV(x))>-e for all xERi, 

(b) F(0)>2 7 -e, 

(c) V(x) <e,x E S, 

where 7 is the decay rate associated with the buffer structure S. 
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For and bitmap b define the jump probabilities: 



A 



cxp(-q(j)/2) 



N b (q) ' 
„ cxp((g(i)-g(j))/2) 
Vl.J N b (q) 
.. exp(g(t)/2) 
nfl N b (q) ' 



i = 0, j = 1 
i ^ o,6(i) = l,i - 
i / 0, 6(i) = 1 
i ^ 0, 6(i) = 0,» 



(8) 



J or j = 0. 



Any smooth function W : M. d — > 
as follows: 

Pw(v\x) 



can be used to define a stochastic kernel p 
-- pt(v\DW(x/n)), (9) 



where DW is the gradient of W. 

Theorem 4.1.1 of [TB] asserts that the IS transition kernel defined by 
smooth subsolutions to (|7|) satisfying growth conditions on their Hessians are 
asymptotically optimal. For completeness we quote this theorem below. 

Theorem 3.1 (Theorem 4.1.1 of [16]). Let {V n } be a sequence o/C 2 ([0, l] d ,R) 

d 2 v n 



functions that satisfy 1) V n is a e n -subsolution 2) q x .q^.. < j~ f or ^j ^ 
{1, 2, d}, for some fixed constant C < oo and a pair of non negative se- 
quences {5 n } and {e n } that converge to and satisfy n5 n — ► oo. Then the IS 
scheme defined by the subsolutions V n is asymptotically optimal. 

In the next section we will construct a sequence of smooth subsolutions to 
(J7|) that satisfy the conditions of this theorem by piecing together at most 2 d 
affine functions for the buffer structure <Si. We will find out in Section O that 
the same sequence also works for 52 (one individual buffer for each node). 



4 Single shared buffer 

In this section we will be working with S = S\ = {x G : x(l) + x{2) + • • • + 
x(d) = 1}. As noted before, S\ corresponds to a single buffer shared by all 
queues in the system. To remind the reader, we are interested in the overflow 
probability: p n = P S (X hits S n before it hits 0), where and S n = {x G 7L\ : 
x/n G Si}. It is proved in [8] that 

lim -- logp n = 7i = min - log p { . (10) 

n— »oo n i 

In particular, this implies that S\ satisfies the conditions of Assumption [TJ 



4.1 The smooth subsolution 

We define the following quantities to write down the subsolution to ([7]) that 
we have in mind. 
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The effective rate Mj(6) of node i at boundary b. 




min (jn, Y^k-.i^k M k(b) + K, ) , 



if 6(f) = 1, 
if b(i) = 0, 



(11) 



where a'- n = Aj^ 2 - is the traffic that leaves the system through node i. The 
recursive formula (jlip is the main ingridient of our construction and is sug- 
gested by the definition of the Hamiltonians ([5]) and the HJB equation ([7]) to 
which we are constructing a subsolution. The form of (jlip and the role Mj(6) 
plays in the solution to the problem suggests the following interpretation of 
pip , pip seems to compute an "effective" service rate for each node taking 
into account whether the node is empty or nonempty. If a node is nonempty 
its effective service rate is simply its service rate. If the node is empty, pip 
seems to consider it as a system whose components are the nodes it directly 
feeds and computes the effective rate as the total effective rates of the compo- 
nents. There is also an upper bound on the effective rate, namely the service 
rate and if the aforementioned total exceeds this bound then again the effec- 
tive rate is set to be the service rate. In this interpretation can be thought 
of as the effective rate of outside of the network for the empty node i. 

The effective utility pi(b) = ■ The effective utility of a node is the 
ratio of its arrival rate to its effective service rate. If node i is nonempty then 
it coincides with the ordinary utility pi. 

The effective gradient q € K rf associated with boundary b. 



where q(i) denotes the i component of the vector q. We will use the affine 
functions defined by the effective gradients to construct our subsolution of . 
The effective gradient q of the boundary b will be the gradient of the smooth 
subsolution around that boundary. 

For each boundary b there is an effective gradient q. It may happen that 
two boundaries b\ and bi have the same effective gradients. Let EG = {qi, 
Q2,---,Ql}, L < 2 d , be the set of unique effective gradients. We identify two 
extreme elements of the set EG: firstly, the effective gradient corresponding to 
the boundary = (0, 0, 0, . . . , 0, 0) (all nodes empty) is = (0, 0, 0, ... , 0, 0) 
(this follows from pip and the definition of p>' i0 ). Secondly, the effective 
gradient corresponding to the boundary 1 = (1, 1, 1, . . . , 1, 1) (all nodes non- 
empty) is the vector whose i th component is logAj//Xj. 

Now define 



qii) = 2 log Pi (b) =2 log 



(12) 



Mi (b) 




(13) 
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The simple gradient q = (qi, q%, qj) associated with boundary b is defined as 
q(i) = 2 log ^rfa where as before Aj is the arrival rate to node i. The following 
lemma relates simple and effective gradients. Bitmaps b' and b satisfy b' > b 
if b'{i) > b(i) for all i € {1, 2, 3, d}. 

Lemma 4.1. Let q be the effective gradient associated with boundary b. Then 
there exists a boundary b > b such that q is the simple gradient associated with 
b. 

Proof. If b = (1, 1, 1, 1, 1) then there is nothing to prove because for this 
boundary the effective gradient and the simple gradient are the same. Then we 
assume that there are some empty nodes indicated by b. b > b is constructed 
as follows. Initially set b = b. For each empty node i in b set bi to 1 if 
Mj(6) = (see (jlip ). It is clear that 1) b > b and 2) the effective and simple 
gradients of b are the same vector which is the effective gradient of b. □ 

Definition 4.1. For an effective gradient qi € EG let b be the boundary whose 
simple gradient equals q\. Define ol\ to be the number o/0's in b plus 1. 

The a/'s will determine the size of the regions where the change of measure 
defined by q\ is used for IS. Now define the piecewise affine subsolution 

L 

W l e (x) = 2 11 -aie + (q h x), W e (x) = /\W[(x), (14) 

i=i 

where L is the number of effective gradients and qi are the effective gradients. 
W £ is piecewise affine and not smooth in general. To obtain the sequence of 
smooth subsolutions satisfying the assumptions of Theorem 4.1.1 of |16j one 
has to let e depend on n and then smooth W e . One smoothing method that 
is simple and easy to implement on a computer is the following [6]. Define 

W^ s (x) = -<51og^exp j-Vf(x)j . (15) 
i=i ^ J 

This smoothing algorithm is based on the following fact: For d real numbers 
ai, a 2 a d : - lim^o 5 log (Yh=i e~ a * /<5 ) = Af=i a i- B Y Lemma 3.12 of 
|12j . W e ' 5 — > W e uniformly as e — > 0. In addition, W e ' S is continuously 
differentiate and a simple direct calculation gives 



DW (x) =2^w{ (x)q h w{ (x) = — z . (16) 



Lemma 4.2. W e ' S defined in (jl5l) satisfies: 
1. fl- 6B (DW e ' 4 (x))>-Ciexp(-f), 
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2. W*> s (0) > 2 7 i - 6 (f log Eti exp {^}) , 

3. W e > s (x) < for x G S\, 



d 2 W e 



— 5 ' 



where C\ and C2 are constants that only depend on the parameters of the 
network (arrival and service rates and the routing probabilities). 

The proof of Lemma 14.21 is in Appendix EJ This lemma directly implies 
that, for e n = — 5 n logS n and 5 n chosen such that 5 n — > and n5 n — ► 00, 
the sequence of smooth subsolutions W €n,Sn (where W e ' 5 is defined as in (I15|) ) 
satisfy the conditions of the optimality Theorem 4.1.1 [16] . This means that 
the IS scheme defined by these subsolutions through ([9]) is asymptotically 
optimal. 

Here we repeat an idea from [6j [12]. The formula ([9]) can be used to 
translate any smooth function into an IS transition kernel. However, for the 
smooth subsolutions there is a slightly different way of defining IS transition 
kernels which turn out to be very convenient in computer simulations. 

For x G define 



£ 

1=1 



w e { S {x/n)pl x (qi){vij), (17) 



i.e., we switch the order of taking the average against the weights w^' S and 
applying the map pi (•) of ©. The advantage of p* of (fT7|) is that it requires 
the computation of Pl(qi) only once at the beginning of the estimation pro- 
cedure. During the simulation only the weights are computed dynamically 
and averages of the precomputed Pl(qi) will be the IS rates. Theorem 4.1.1 of 
|16j doesn't cover this way of computing the IS rates. However, the modifi- 
cation of this theorem to accommodate direct averaging entails no significant 
changes. In the next section we report on the numerical performance of these 
algorithms. 



4.2 Interpretation of the IS algorithm defined by the subsolu- 
tion 

Let b a boundary and q its effective gradient. (fTT|) essentially uses pb(q) as 
the IS change of measure when the queueing process is on the boundary b and 
away from the lower dimensional boundaries contained in b. Looking at ([121) 
and ([8]) one sees that pb(q) is simply the following change of measure: 

f/Xij, if node i is empty, 

- \ Pi (b) ., , . . (loj 
I j ^-Wy , if node 1 is nonempty, 



10 



where Pi(b) and Pj(b) are the effective utilities of nodes i and j. These new 
rates are renormalized so that they sum to 1. By convention po(b) = 1, i.e., 
the outside of the system is thought of as a node with utility 1. The IS scheme 
given by (fT7|) uses a convex combination of (fT8l) when the simulated queuing 
process transitions from one boundary to another. 

(j!8j) illustrates well how the IS change of measure given by the subsolution 
approach works. In the course of a simulation, the IS change of measure 
depends on which nodes are currently empty and nonempty. The service 
probabilities of empty nodes are not modified. The service probability pij 
of a nonempty node i is modified through a comparison of the traffic at the 
source i and the target j; the service rate is increased if the source is busier, 
decreased otherwise. The goal seems to be to direct traffic to the less strained 
node. The traffic is measured by the effective utilities. For an empty node 
the effective utility is a value that takes into account the traffic in the nodes 
that follow it immediately. We also note that the arrival rate A is replaced by 
A = A which is always larger than A. Therefore the rate of traffic from 
outside is always increased. Similarly, the rate of traffic to outside is always 
decreased. 

We would like to also note that the standard state independent heuristic 
IS algorithms based on large deviations results can be thought of as variants 
of (I18p in which the standard utilities are used instead of the effective utilities. 

5 Numerical Results 

Choice of e and 6. The IS algorithm defined by W e ' S of (fT5|) has two 
parameters e and 5. The optimality Theorem 13.11 suggest 5 ~ C/n and e ~ 
—8 log 5. Asymptotic optimality criterion is not precise enough to impose a 
value for C. For the choice of this constant we used experimental evidence. 

Once e and 5 are fixed, p*(v\x) of (fT7|) is used as the IS change of measure. 
The effective gradients qx, q^ and their cfy's are computed by iterating 
over all boundaries b and computing the effective gradient of each of them 
using the formulas (jllj) and (I12p and the Definition 14.11 

In the following subsections we present simulation results for various Jack- 
son networks with a tree topology. In all the estimations K = 10000 sample 
paths were used. 

Example 1 . We first consider the network in Figure HJ Let us consider 
the case when A = 0.04, = pi$ = 0.12, //2,o = A*2,3 = ^2,4 = 0.08, 
^3,0 — ^3,1 — /M,o = ^4,1 = 0.12. The node utilities in this case are: p\ = 
1/6, pi = 1/12, pz = Pa = 1/36. In this example, the utilities are unevenly 
distributed and node 1 is the most strained node. We take n = 30. For 
n = 30, and with this four dimensional system, it is possible to compute 
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Figure 1: Example 1 



Exact probability p 30 = 3.269 X 10 -23 





Estimate p n 


Standard Error 


95 % CI 


Est. 1 


3.50 X 10~ 23 


0.19 X 10~ 23 


[3.12, 3.88] X 10~ 23 


Est. 2 


3.22 x 10~ 23 


0.16 X 10~ 23 


[2.89, 3.54] x 10 _i!3 


Est. 3 


3.28 X 10~ 23 


0.17 X 10~' j!is 


[2.94, 3.61] X 10~ 23 


Est. 4 


3.32 x 10~ 23 


0.17 X 10~ 23 


[2.98, 3.66] X 10~ 23 


Est. 5 


3.16 X 10~ 23 


0.16 X 10 _a3 


[2.84, 3.48] X 10 _i!3 



Table 1: Simulation Results for Example 1 



P30 without any simulation using the Markov property and straight-forward 
iteration. Such a computation yields p^o = 3.269 x 1CP 23 . For the subsolution 
based IS algorithm we take e = 0.25 and 5 = 0.08. There turns out to 
be only five effective gradients for the given rate values above. The results 
of five consecutive estimations using the subsolution based IS algorithm are 
displayed in Table [U The 'standard error' column is the standard error of 
each estimation. The 95% confidence intervals are p n + [— 2SE, 2SE], where 
SE is the standard error displayed under the standard error column . These 
intervals are only formal, i.e., we make no assertion about the normality of 
these errors. Note that the estimation results are very close to the exact value 
and the "95% confidence intervals" are accurate: in all these estimations the 
exact value happened to be in the computed confidence interval. In total all 
five estimations took around 20 seconds on an ordinary laptop manufactured 
in 2004. 

Example 2. Now we look at the 8- node network depicted in Figure [2j We 




Figure 2: Example 3 
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Estimate p n 


Standard Error 


95 % CI 


Est. 1 


1.11 x 10~ H 


U.17 X 10~ B 


[0.78, 1.44] x ur° 


Est. 2 


1.69 X 10 _0 


0.32 X 10~ B 


[1.04, 2.34] x 10 -0 


Est. 3 


1.25 x io -0 


0.18 X 10~ B 


[0.89, 1.61] X 10 - " 


Est. 4 


1.94 x 10~ H 


0.51 X 10~ B 


[0.92, 2.97] x 10~ B 


Est. 5 


1.23 X 10~ H 


0.17 x 10~ B 


[0.89, 1.56] X 10 _0 



Table 2: Simulation results for the network with eight nodes 

take the arrival rate A = 0.1248, The service rates are taken to be: /ii2 = 
0.062442, fx 1)3 = 0.1874, ^ 1)4 = 0.062442 /x 1|0 = 0.062517 p 2 ,o = 0.06, /x 3 ' = 
0.036, ^ 3j5 = 0.072, ^ 3>6 = 0.072, /m,o = 0.03, p 4 j = 0.03, p 5fi = 0.0365, 
^5 )8 = 0.0365, p efi = 0.073, p 7fi = 0.025, p 8 ,o = 0.028. For this choice of the 
network parameters, the utility of each node turns out to be approximately: 
pi = 0.331738, p 2 = 0.3465, p 3 = 0.3466, p A = 0.3465 , p 5 = 0.3419, p 6 = 
0.3466, P4 = 0.3465, p$ = 0.4158. All nodes are similarly utilized, although the 
load on node 8 is slightly heavier then the rest. A straightforward simulation 
with 10 8 samples estimate p^Q to be 1.2 x 10 -6 with a standard error of 
1.1 x 10 -6 . The subsolution based IS simulation results are given in Table 
O The parameters of the algorithm are taken to be e = 0.4 and 5 = 0.1. 
Each estimation uses 10000 samples. For this network there are 256 effective 
gradients. Total run time for all these five estimations was about 20 minutes. 

As can be seen, the subsolution based IS algorithm performs very well for 
this high dimensional system too: the estimate is within the 95% confidence 
interval of the MC estimator and the formal 95% confidence inervals of the IS 
simulation do not wildly fluctuate. 

6 Individual Buffers for each Node 

In this section we look at the buffer structure £2: for E 

<?2 = {x € : x(i) = for some i and x(j) < (3{j) for all j}. 

As we noted before, S n = {x G : x/n 6 £2} corresponds to d independent 
buffers, one for each node. The size of the buffer for node i is given by nj3(i). 
Without loss of generality we will assume that Vif5(i) = 1. We are, as before, 
interested in: p n = P S (X hits S n before it hits 0), where s = (1, 0, 0, . . . , 0). 
One can prove, using arguments similar to those in [8] that 

lim -- log p n = 72 = mm -/3(i) log pi, (19) 

n— >oo n i 

where pi are the node utilities. In particular, this implies that £2 satisfies the 
conditions of Assumption [TJ Our goal now is to prove that the IS algorithm 
defined by W e " ,Sn is asymptotically optimal for the buffer structure £2 as well 
(when buffer structure is changed to 52, 71 in (fT4"|) needs to be replaced with 
72). To prove this, it is enough to prove a version of Lemma 14.21 for 52- Note 
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Exact probability pig = 6.8601 x 10 





Estimate p n 


Standard Error 


95 % CI 


Est. 1 


7.33 X 10 _a 


0.42 X 10~ !J 


[6.50, 8.17] x 10 _a 


Est. 2 


6.81 X 10 _a 


0.34 X 10~' J 


[6.12, 7.50] X lO - '- 1 


Est. 3 


7.30 x 10 _a 


0.38 x 10~' J 


[6.53, 8.06] X 10~' J 


Est. 4 


7.05 x 10 -a 


0.39 X 10~ !J 


[6.28, 7.83] X 10~ a 


Est. 5 


7.01 x 10 _a 


0.37 X 10~' J 


[6.26, 7.76] X 10 _a 



Table 3: Simulation results for the case when each node has a separate buffer 

that only item 3 of this lemma depends on S and therefore we only have to 
prove that the same item holds for 52, which is done in the next lemma. 

Lemma 6.1. Define Wf' e (x) = 272 — a/e+ (qi,x), where ai and qi are defined 
as in (I12p and Definition ^. 1\ and 72 is the large deviation rate associated with 
the boundary S 2 (|19|) . Define W e,s by the expression (US]). Then: W e ' S (x) < 0. 
for x € £2 • 

Proof. Take any x £ £2- Then, there is an i < d such that x(i) = (3{i). Let 
qL be the effective gradient of the boundary 1 = (1, 1, 1, . . . , 1, 1). 

W(x) = -Jlog^exp I -^(272 - one + (qi,x)) \ < 2j 2 + (qi,x) - a L e. 
1=1 ^ J 

By definition, qL{i) = 2 log ^ and the rest of the components of qi are nega- 
tive. These facts, (fT9|) . x E and x(i) = (3{i) imply that the last display is 
less than — a^e. This finishes the proof of this lemma. □ 

Numerical example Consider a network with five nodes with the following 
service rates: = 0.038, ^1,3 = 0.057, flip = 0.095, 112,4 = 0.076, ^2,0 = 
0.114, /i 3)5 = 0.095, /x 3)0 = 0.095, /i 4 ,o = 0.19, '/x 5)0 = 0.19 and A = 0.1. We will 
suppose that the buffer sizes for the nodes are respectively: 15, 15, 17, 18, 19 
Then n = 19 and (5{l) = 0(2) = 15/19, /?(3) = 17/19, 0(4) = 18/19, /?(5) = 1. 
The choice of the buffer sizes are rather arbitrary. We chose them relatively 
small so that it was possible to compute the buffer overflow probability pig 
using the Markov property and direct iteration. The exact value of pig turns 
out to be P19 = 6.8601 x 10~ 9 . 

The relative node utilities are: 0(1) pi = 0.208, 0(2)p 2 = 0.042, (3(3)p 3 = 
0.013, 0(4)p4 = 0.0004, P(S)ps = 0.0008. Node 1 is clearly the most strained 
node and the loads on the rest of the nodes are spread. Following the same 
reasoning as in Section [5] we take e = 0.3 and 5 = 0.1. The IS simulation 
now proceeds as before. One uses p(-\x) = p*(-\x) given in (fT7|) for the IS 
change of measure. There turns out to be only eight effective gradients (out 
of a maximum of 32). The results of five consecutive estimations using the 
subsolution based IS algorithm are displayed in Table [3j Once again, the 
estimation results are close to the exact value pig = 6.8601 x 10~ 9 and the 
formal 95% confidence intervals are tight and happen to contain the exact 
value. 
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7 Discussion 



The goal of the present paper was to extend the IS algorithms in [12], which 
looked at tandem Jackson networks, to more general networks. We thought 
tree networks were an interesting generalization and a comparison with the 
algorithms in [p2] will reveal that the tree networks require much more sophis- 
ticated subsolutions and IS algorithms for asymptotic optimality. [7] proves 
a further generalization to arbitrary stable Jackson networks. In this section 
we would like to discuss how the results in [7J relate to our results. 

Let pij = Hij/ni denote the routing probability from node i to j, where 
j is allowed to take the value 0. In the notation of the present paper, the IS 
algorithm in [7J can be described as follows. Define the effective rate for the 
boundary b as: 

{iii, if b(i) = 1, 

As before if a node is nonempty under b, i.e., b(i) = 0, then its effective rate 
is just the service rate fj,{. If it is empty, one now takes a weighted sum of 
the effective rates of its neighbors, as before this sum is min'ed with //j. The 
weight of Mf.(b) is the fraction of the k th node's traffic in the fluid model that 
is coming from node i. This fraction is always 1 for a tree network and thus 
for such networks (i20|) reduces to (fTTj) . Once the effective rates are defined as 
above one proceeds as in subsection 14.11 

We note that (llip is a recursive formula: one can start from the leaves of 
the network and go up and compute all effective gradients using (|11|) . In the 
case of general Jackson networks (|20p is an equation that needs to be solved; 
as observed in [7J, it can be solved by reducing it to a linear equation, which 
is a generalization of (I13p . It can also be directly solved using (|20f) itself and 
an iterative method. 

Another contribution of [7J is the identification of the large deviation decay 
rate 7 of p n for any exit boundary S for which such a rate exists. In the 
notation of the present paper, [TJ Proposition 3.1] asserts that 

7 = in f -(q,x) 

x€S 

where q is the effective or simple gradient of b = (1, 1, 1, . . . , 1). As noted in 
[7J this implies that the IS change of measure given by (|20p . or (jllH for the 
case of tree networks, is asymptotically optimal for any buffer structure S for 
which there is a large deviation decay rate. 

Finally, we would like to point out a parametrization that seems most 
natural for pp]) . Define M; = 1/pi and Mi (6) = Mj(6)/Ai. The first is the 
ordinary service to arrival ratio of node i. The second can be thought of as 
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the effective service to arrival ratio of the same node when the system is on 
boundary b. By convention let Mo (b) = 1, i.e., the service to arrival ratio of 
the outside of the system is 1. In terms of these new variables (|2Up is simply: 

M,(i,,= { M " if " (!) = 1 PU 

\mm(M„£ fcf ^Rj ! M t (l>)), if f,(i) = 0, 

where k = value is allowed in the summation to denote the outside of the 
system. If node i is empty, its effective service to arrival ratio is taken to be 
the average of the effective ratios of the nodes that are directly connected to 
i. The average is taken with respect to the routing probabilities. As before 
the ordinary service to arrival ratio is an upperbound on the effective one. So 
if the average exceeds the ordinary, the effective ratio is set to the ordinary 
ratio. 

The effective gradient q for b will have components — 2 log Mj(6). And the 
change of measure pb(q) is: 

{Hij, if node i is empty 

Vij^M, ^ node * is nonempty, 

and this is renormalized so that /Zj j sum to 1. One can use (|21jl directly to 
compute the IS algorithm. 

A Proof of Lemma 14.21 

Before we begin, a convention: the decay rate 7 depends on the buffer struc- 
ture. We used 71 for the shared buffer (Si) and 72 for the individual buffers 
for each node (£2). In the proofs we will simply write 7. 

Lemma A.l. Let q be the simple gradient associated with boundary b. Then 
Hi{q) = for any b > b. 

Proof. We first prove that Hb(q) = 0, or equivalently Nb(q) = 1. Directly 
from the definitions ©, one sees that Nb(q) = 1 if and only if 

I] m J'0)+/<o +mi(6) = A+ Y 

i:6(i)=l \j ' -J J t:6(i)=l 

The definition of directly imply that Yli=i K = ^- ^ e a bove display 
follows from this fact and (]13p . 

Next fix a b > b. We will show that iVj(g) = 1. 

N- b (q) - N b (q) = ^ ^ + Y - Y * 

i:b(i)-b(i)=l,i-*j t:6(i)-6(t)=l i:b(i)-b(i)=l 

(22) 
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Fix i such that b(i) — b(i) = 1 and let C denote the terms contributed by the 
index i in the first two sums. Our goal is now to show that C = fa. This will 
imply that first two sums and the last sum in (|22|) cancel each other and that 
Ni(q) = Nb(q). Because b(i) = we have that 

mi (b)= mj(b)+ /4, . (23) 

Then 

^ n(iM'> \ gw-g(j) Aj \ - A< mAb) 

C = M.oe""' 2 + £ w - = /«,°^ + £ "-^^ 

3-*-*3 

At this point the facts A,- = Aj^. anc i Bi^hi = J anc [ flCT and simple 

J fj,^ fj,^ tj\ J 

arithmetic yield C = fa. Thus the difference in (j22|) is zero, i.e., N^(q) = 
Nf,(q) = 1. This finishes the proof of this lemma. □ 

Lemma A. 2. Let q be the effective gradient associated with boundary b. Then 
H h >(q) > for all b' > b. 

Proof. Ht,i(q) > if and only if Ny(q) < 1. By Lemma 14. 1 1 there exists b > b 
such that q is the simple gradient associated with b. Then by Lemma I A. II 
Ny (q) = 1 for all b' > b. Now take any b' such that b' < b and b' > b. Because 
b > b' > b we have 

N- b (q) - N b (q) 

^2 lHje tW ' 9(3) + Mi,oe 9(i)/2 J - ^ & 

/ i:5(i)-6(i)=l 

A, Mj-(6) , A, \ x . 




Mi (6) A, ^'' u Mi(6) 
,^(&)+/4o 



z:b(i)-b(i) = l 



Mi (6) 



i:6(i)-6(i)=l 



Now by the construction of b, b(i) — b(i) = if and only if Mj(6) = fa < 
Y.i^j M j( b ) +/4,0> The last display and flM} imply N~ b (q) > N b (q). Because 
Nf)(q) = 1 (because q is the simple gradient associated with boundary b) this 
finishes the proof of this lemma. □ 

Proof of Lemma The proof is this lemma is similar to the proof of Theo- 
rem 4.31 in [16] . For small positive real numbers 5, e let W e ' S be defined as in 
(|15p . For ease of notation we will drop the superscript (e, 5) and write W. We 
would like to prove the following: there is a constant C\ that only depends on 
the parameter system such that for all x £ Hf,(DW(x)) > —C\ exp(— e/5), 



17 



where b defined in ([6]) is the boundary corresponding to x. Let E be the set 
of effective gradients q such that there is a boundary b' < b with effective 
gradient q. Define q' = ^2 q e E'Wi' S (x)qi, where w^' 6 are the weights defined in 
(11611 . Once again to ease notation, we drop the superscript (e, 5). Its definition 
directly implies that H b is concave and Lipschitz continuous. By Lemma lA.21 
we have that H b (q) > for q £ E. This fact and the concavity of H b and 
H&(0) = imply that H b (q') > 0. This, (fTBj) and the Lipschitz continuity of 
H b give 

H b (DW(x)) = H b (q') + H b (DW(x)) - H b {q') > \H b (DW(x)) - H b (q')\ 
> K\q' - DW{x)\ = -K M X )W\- 

The last inequality follows from (|16p and the triangle inequality. Therefore to 
prove the first part of Lemma 14.21 it is enough to prove wi(x) < exp(— e/6), 
for I such that q\ £ E c . 

By its definition (fT6|) w\ equals 

exp{-W l e (x)/5} exp{(a ; e- (q h x))/5) 



wi(x) 



Ei=i ex p{-^7 (x)/6) Ej=i ex p{K £ - (ij^))/s} 

K exp{(q;e- {q u x))/8} 
~ exp{(a J0 e- (q jo ,x})/6}' 



where qj is an effective gradient to be selected. By Definition 14.11 ai is one 
plus the number of 0's in the the boundary (bitmap) r whose simple gradient 
equals qi. Form the bitmap f from r as follows: if r(i) = 1 but b x (i) = 
then set r(i) = otherwise set r(i) = r(i). By this construction f < b x and 
f < r. The last inequality is strict, because otherwise we would have b x = r 
which would imply, by Lemma \A.2\ H bx {qi) > which in turn contradicts 
qi £ E. Let qj Q be the effective gradient associated with the bitmap f. r < b x 
and Lemma IA.2I imply that H bx {qj ) > 0. This implies that q~ € E and 
consequently qj qi £ E c . These facts and the strict inequality f < r imply 
that aj — ai ^ 1- 

Furthermore, remember x is such that xi = if b x (i) = 0. The bitmaps 
r and f differ only at such i. Then the effective gradients of these bitmaps, 
namely qi and qj will also differ only at such i. This means (qi,x) = (qj ,x). 
These considerations and ([25]) imply wi(x) < exp(— e/S) and hence the first 



part of Lemma | 
By its definition 



W (0) = -5 log Y exp { - } = 2 7 - e ^ log ^ exp [ ^ } 

This proves the second part of Lemma 14.21 
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Now let us prove the third part. Let qi be the effective gradient of the 
boundary 1 = (1, 1, 1, . . . , 1, 1). For x 6 with x\ + X2 + ■ • • + x^ = 1 we 
have the following estimate: 

W(x) = -51og^exp |-^( 2 7 - a«e + j < (ql,x) + 2j - a L e. 

By definition qL{i) = 2 log j^. This and ()10p imply that the last line is less 
than — «l£. This finishes the proof of the third part of Lemma 14.21 It only 
remains to prove the last part. Differentiating the first expression in (jl6[> 
gi ves: dx^dx ( x ) = ^2d=i §^r( x ) ( &W- Differentiating the second expression in 

CGI § ives: 15xj( x ) = \ w ii x ) (Ylk=i w k( x )(qk(j) -qi(j))j ■ These imply the 
bound in part 4 of Lemma 14.21 which is what we wanted to prove. □ 
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